- Title
- Contextualizing the current state of research on the use of machine learning for student performance prediction: A systematic literature review
- Creator
- Alalawi, Khalid; Athauda, Rukshan; Chiong, Raymond
- Relation
- Engineering Reports Vol. 5, Issue 12, no. e12699
- Publisher Link
- http://dx.doi.org/10.1002/eng2.12699
- Publisher
- Wiley
- Resource Type
- journal article
- Date
- 2023
- Description
- Today, educational institutions produce large amounts of data with the deployment of learning management systems. These large datasets provide an untapped potential to support and enhance decision‐making and operations. In recent times, machine learning (ML) has been applied to develop models utilizing this “big” data to assist in decision‐making. This study presents a systematic literature review into the application of ML to predict student performance. A total of 162 research articles from January 2010 to October 2022 were critically reviewed and analyzed by applying Kitchenham's systematic literature review approach. Our analysis categorized the literature predicting students' academic performance into two categories: (i) predicting student performance in assessments, courses or programs, and identifying students at‐risk of failing their course/program (129 studies); and (ii) predicting student dropout or retention in a course or program (33 studies). Classification is the most commonly used approach for predicting student performance (138 studies), followed by regression (25 studies) and clustering (9 studies). Supervised learning methods are used more often than semi‐supervised learning. Five most popular ML algorithms include the Decision Tree, Random Forest, Naïve Bayes, Artificial Neural Network, and Support Vector Machine. Historical records of students' grades and class performance, academic related data from learning management systems, and students' demographics are the most common features used for predicting students' performance. The most common methods used for feature selection are Information Gain‐based selection algorithms, Correlation‐based feature selection, and Gain Ratio. The general platforms/tools/libraries used in the studies include WEKA, Python, R, Rapid Miner, and MATLAB. We also investigated possible actions considered in the literature to help at‐risk students. We only found very few studies that deployed remedial actions and evaluated their impact on students' performance. In conclusion, ML has shown great potential in the prediction of student performance, but also has many areas of further research.
- Subject
- classification; clustering; educational data mining; machine learning; prediction; regression
- Identifier
- http://hdl.handle.net/1959.13/1502030
- Identifier
- uon:55197
- Identifier
- ISSN:2577-8196
- Rights
- © 2023 The Authors. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
- Language
- eng
- Full Text
- Reviewed
File | Description | Size | Format
---|---|---|---
ATTACHMENT02 | Publisher version (open access) | 2 MB | Adobe Acrobat PDF